Our project was motivated by the need to understand the demand for and the supply of social services in the United States. To narrow the scope, we restricted this study to California, where access to social services such as WIC and SNAP is relatively greater. In FY 2021, SNAP helped 4.3 million California residents (11% of the population), and WIC helped 950,000 California residents.
Studies also indicate that many households in California struggle to put food on the table, which points to the need for further analysis of WIC and SNAP services in the state.
Most recent data indicates that:
Policy Questions:
We used data from California Open Data portals, USDA.gov, and U.S. Census data (retrieved with the tidycensus R package) to conduct our analyses.
We used a combination of tools to answer our motivating policy questions. First, we conducted exploratory analyses, including graphs and geospatial visualizations, to examine access to WIC and SNAP store locations across California counties. These analyses let us discern trends between different populations and explore how development (urban or rural county) relates to access to social services. We then used supervised and unsupervised methods (text analysis, cluster analysis, and linear regression) to identify store locations likely to accept WIC or SNAP and to predict the number of vouchers redeemed across counties based on the date and the number of families in the region.
The challenges we came across while coding this project were largely related to the data we had access to. We were unable to analyze the data by race and gender because of the way these factors were coded in the table. With more time and access to resources, we would love to analyze how women and non-binary folks access these social services. Comparing and contrasting these results across urban and rural counties will also be critical to policymaking.
What is SNAP?
SNAP is the Supplemental Nutrition Assistance Program. It provides nutrition benefits that supplement the food budgets of families in need so they can purchase healthy food and move toward self-sufficiency. It is the largest federal nutrition assistance program and provides benefits to eligible low-income individuals and families.
What is WIC?
WIC is the Special Supplemental Nutrition Program for Women, Infants, and Children, a food assistance program sponsored by the USDA. WIC operates by providing vouchers or electronic benefit transfer (EBT) cards that participants use to buy WIC-approved foods from commercial food retailers.
Average WIC Cost by Participant Analysis (Time Series 2010-2018)
WIC State agencies reimburse retailers for WIC foods at the prices the vendor charges, but only up to State-regulated maximums. The prices stores charge can therefore affect the average WIC food costs of the State and of each county. Because stores must become authorized to sell WIC-approved foods, authorized retailers have considerable influence over pricing and can add administrative costs to WIC-approved foods.
This exploratory visualization is consistent with previous studies of average WIC costs per participant in the United States, and specifically in California. The two participant categories that stand out in average cost per participant throughout the years are “infant” and “breastfeeding mother”. The team hypothesizes that the target demographic of new mothers influences pricing: we assume demand is higher for infant products such as formula and other baby foods. Because WIC vendors act as price setters with the autonomy to adjust prices to demand, this structure could explain the large discrepancy between “infant” and “breastfeeding mother” average costs.
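As a rough illustration of the reimbursement rule described above, the Python sketch below assumes a single State maximum per food item and reimburses the lower of the vendor's price and that cap. Actual State rules are more involved; the function names and prices here are hypothetical.

```python
def wic_reimbursement(vendor_price: float, state_max: float) -> float:
    """Hypothetical simplified rule: pay the vendor's price, capped at the State maximum."""
    if vendor_price < 0 or state_max < 0:
        raise ValueError("prices must be non-negative")
    return min(vendor_price, state_max)


def basket_reimbursement(items):
    """Sum reimbursements over (vendor_price, state_max, quantity) tuples."""
    return sum(wic_reimbursement(price, cap) * qty for price, cap, qty in items)


# Illustrative prices only; these are not from our data.
basket = [
    (4.50, 5.00, 2),  # priced under the cap: reimbursed at the vendor price
    (6.25, 5.00, 1),  # priced over the cap: reimbursed at the State maximum
]
print(basket_reimbursement(basket))  # 14.0
```

Under this simplification, a vendor pricing below the cap is paid in full, so raising prices toward the cap raises average food costs per participant.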
WIC Redemption Participation Rates by Type (Time Series 2010-2018)
Interestingly, the average participation rate is approximately 60% for children and under 20% for every other participant category. We suspect this is because WIC benefits do not expire, although they are subject to a benefit reduction rate (BRR), so children in the program continue to benefit from WIC. On average, the mother-to-child ratio is also higher.
WIC Redemption Participation Rate per Average Cost
This graph compares average costs per participant in rural and urban counties. Urban is coded as a binary variable, hence the 0s (rural) and 1s (urban) on the x-axis. In both urban and rural counties, the infant category shows the widest spread in average cost per participant. Urban counties generally had a wider range, reflecting more outliers in average_cost. Rural counties have a tighter concentration of costs with fewer outliers.
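The spread comparison in this graph amounts to a group-by on the 0/1 urban flag and participant category. This minimal Python sketch uses made-up records and measures spread as the range (maximum minus minimum) of average_cost; the graph itself shows the full distribution, and how counties were assigned to the urban flag is not reproduced here.

```python
from collections import defaultdict

# Hypothetical records: (county, is_urban, participant_category, average_cost).
# is_urban is the 0/1 flag (0 = rural, 1 = urban); all values are made up.
records = [
    ("A", 1, "Infant", 120.0),
    ("A", 1, "Child", 40.0),
    ("B", 1, "Infant", 190.0),
    ("B", 1, "Child", 45.0),
    ("C", 0, "Infant", 130.0),
    ("C", 0, "Child", 42.0),
    ("D", 0, "Infant", 150.0),
    ("D", 0, "Child", 44.0),
]


def cost_range_by_group(rows):
    """Return {(is_urban, category): max - min of average_cost}."""
    groups = defaultdict(list)
    for _county, is_urban, category, cost in rows:
        groups[(is_urban, category)].append(cost)
    return {key: max(vals) - min(vals) for key, vals in groups.items()}


for (is_urban, category), spread in sorted(cost_range_by_group(records).items()):
    print(is_urban, category, spread)
```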
Statewide Average Costs by Participant Category
Average Cost by Participant, 0=Rural and 1=Urban
SNAP Number of Retailers per 100,000 People by County 2019
Using exploratory and geospatial data analysis, the team created a geographically accurate map of California by county. California clearly has a wealth of SNAP retailers. California's population, demographics, socio-economic positioning, and diverse landscape of employment and income provide an interesting picture of this means-tested federal program. The map shows that rural counties in Northern California have the most retailers per 100k people. The darkly shaded areas on the western edge of the state are major counties such as San Francisco, San Mateo, Contra Costa, Alameda, and Santa Clara. These major urban areas have fewer retailers per 100k. We hypothesize that (1) larger retailers, each serving more people, are the ones able to provide SNAP benefits and (2) these counties are small in land area.
Though the map indicates that mostly rural areas have more retailers per 100k, the team is unsure how effective those stores are at providing SNAP benefits, or at offering nutritious options for individuals and families.
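The per-capita measure on this map is simple arithmetic: retailers divided by population, scaled to 100,000 people. The Python sketch below uses hypothetical counts to show why a sparsely populated county with relatively few stores can still outrank a large urban county on this measure.

```python
def retailers_per_100k(retailers: int, population: int) -> float:
    """Retailers per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep whole-number inputs exact where possible.
    return retailers * 100_000 / population


# Hypothetical counts, not taken from our data.
print(retailers_per_100k(150, 60_000))     # small rural county: 250.0
print(retailers_per_100k(900, 1_000_000))  # large urban county: 90.0
```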
SNAP Participation Rates by Age Range in 2019 & Time Series Graph (Elderly, Adult, and Child)
SNAP participation rates differ among three age groups. In Imperial County, for example, the elderly participation rate was higher than the rates for the other age ranges between 2014 and 2019. Counties we would like to draw attention to are San Bernardino, Imperial, and Tulare. We see higher rates across all three age ranges in the Central Valley, which is primarily where agricultural workers live.
Based on participation rates, the primary beneficiaries of this program are children.
Overall SNAP Participation Rate
Although many people are eligible for SNAP, many of them do not redeem their benefits. The SNAP redemption rate per county is fairly low, with a few outliers; most counties hover around a 55% redemption rate. In the future, the team hopes to analyze why SNAP redemption rates are not higher, what administrative burdens are holding them back, and what policy solutions could increase SNAP participation rates.
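A county-level rate like this one can be sketched as participants divided by eligible people, summarized with the median across counties. The Python sketch below assumes that definition; the county names and counts are hypothetical, and the exact numerator and denominator in the source data may differ.

```python
from statistics import median


def participation_rate(participants: int, eligible: int) -> float:
    """Share of eligible people who participate (0 to 1)."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return participants / eligible


# Hypothetical county counts (participants, eligible); not real data.
counties = {
    "County A": (5_500, 10_000),
    "County B": (2_700, 5_000),
    "County C": (1_200, 2_000),
}
rates = {name: participation_rate(p, e) for name, (p, e) in counties.items()}
typical = median(rates.values())
print(f"median participation rate: {typical:.0%}")
```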
WIC Retailers per 100,000 People by County
Compared with SNAP, there are fewer WIC retailers in California. However, as with SNAP, we see a concentration of WIC retailers in rural Northern California counties. Urban counties like San Francisco and the others mentioned in the SNAP analysis have fewer WIC retailers. We hypothesize that this may be for the same reasons suggested above for the smaller number of SNAP retailers in major urban areas of California.
Machine Learning with WIC: Linear Regression
Using the WIC data, the team tested whether a linear regression model could predict the number of WIC vouchers redeemed from the date (year and month), average cost, and number of participants. We resampled the data with 10-fold cross-validation and computed the Root Mean Square Error (RMSE) on each held-out fold. A low RMSE means that the predicted values are close to the actual values. Across the folds, fold 7 had the lowest RMSE; averaged over all ten folds, the RMSE was about 5,143 and the R² was 0.992.
Below are our cross-validation metrics and final predictions:
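To make the cross-validation step concrete, here is a stdlib-only Python sketch of k-fold cross-validation with a per-fold RMSE. It is not our tidymodels workflow: it fits a one-predictor least-squares line on synthetic data (vouchers as a noisy multiple of participants), and all function names are our own.

```python
import math
import random
import statistics


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = a + b * x (one predictor, for brevity)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(statistics.fmean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def kfold_rmse(xs, ys, k=10, seed=42):
    """Shuffle row indices, split into k folds, return one RMSE per held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return scores


# Synthetic data: vouchers are a noisy multiple of participants (made up).
rng = random.Random(0)
participants = [rng.randint(1_000, 5_000) for _ in range(200)]
vouchers = [4.2 * p + rng.gauss(0, 100) for p in participants]

scores = kfold_rmse(participants, vouchers, k=10)
mean_rmse = statistics.fmean(scores)
std_err = statistics.stdev(scores) / math.sqrt(len(scores))
best_fold = min(range(len(scores)), key=scores.__getitem__) + 1  # 1-based fold number
print(f"mean RMSE {mean_rmse:.1f} (std_err {std_err:.1f}); lowest RMSE in fold {best_fold}")
```

The mean and standard error across folds summarize expected out-of-sample error, which appears to be what the `mean`, `n`, and `std_err` columns of the metrics tibble report; the single fold with the lowest RMSE mostly reflects which rows happened to land in it.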
## # A tibble: 2 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 rmse standard 5143. 10 212. Preprocessor1_Model1
## 2 rsq standard 0.992 10 0.000722 Preprocessor1_Model1
## # A tibble: 33,302 × 6
## year month number_of_participants_rede… number_vouchers… average_cost .pred
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2010 1 3698 15460 59.1 13086.
## 2 2010 2 3674 15292 58.3 13036.
## 3 2010 3 3728 15506 58.1 13234.
## 4 2010 4 3625 15232 59.1 12795.
## 5 2010 5 3662 15282 58.7 12944.
## 6 2010 6 3703 15519 58.6 13089.
## 7 2010 7 3713 15474 58.5 13124.
## 8 2010 8 3671 15426 60.0 12873.
## 9 2010 9 3708 15430 59.8 13012.
## 10 2010 10 3703 15308 60.4 12943.
## # … with 33,292 more rows
Cluster Analysis: WIC
White squares indicate little or no correlation, as between cost and year (r ≈ 0.01) and between cost and participation type (r ≈ -0.08). This suggests that costs and participation types are consistent across the years and that external factors do not appear to affect them. Most importantly, average cost is consistent across years. This could inform WIC benefit policy, for example by ensuring that benefits keep pace with costs, and could inform a potential expansion of the program.
The number of participants and the number of vouchers are almost perfectly positively correlated (r ≈ 0.99), suggesting that the number of vouchers per participant varies little. We also see modest negative correlations between participation type and both the number of participants (r ≈ -0.33) and the number of vouchers (r ≈ -0.35). Our interpretation is that as the number of participants and vouchers increases, the numerically coded participant type tends to decrease. This could inform how food policy is targeted toward participants and how to alleviate potential administrative burdens for people redeeming vouchers.
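The correlation heatmap is built from pairwise Pearson correlations. The Python sketch below computes such a matrix on hypothetical toy columns named after our variables (n_part, n_vouchers, part_type); it is an illustration, not the code that produced the matrix in this report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to the correlation of those two columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical toy columns: here vouchers are exactly 4x participants.
toy = {
    "n_part": [100, 200, 300, 400],
    "n_vouchers": [400, 800, 1200, 1600],
    "part_type": [4, 3, 2, 1],  # a category stored as integer codes
}
matrix = correlation_matrix(toy)
for (a, b), r in sorted(matrix.items()):
    print(f"{a:>10} vs {b:<10} r = {r:+.3f}")
```

Because part_type is a category stored as integer codes, its correlations with other variables depend on the order in which the categories were coded.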
WIC PCA: Optimal Number of Clusters
The team used three methods to determine the optimal number of clusters for our unsupervised analysis: silhouette, gap_stat (gap statistic), and wss (within-cluster sum of squares). They suggested 2, 6, and 2 clusters, respectively.
Using k-means clustering with 6 clusters together with PCA, the team plotted the data on the first two principal components, PC1 and PC2. This cluster analysis supports our earlier finding that the participant categories are similar in terms of average_cost, with the exception of a few outliers, specifically in urban areas. In rural areas, participant categories had a wider range, which this WIC PCA also supports.
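Two of the three criteria can be sketched directly: wss is the total within-cluster sum of squares (look for the elbow as k grows), and the silhouette method picks the k with the highest mean silhouette width. The Python sketch below implements plain k-means, wss, and silhouette on hypothetical 2-D points; the gap statistic is omitted. The method names match the options of the R function `factoextra::fviz_nbclust()`, but this sketch does not use it.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on tuples; returns (labels, centers)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest current center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return labels, centers


def wss(points, labels, centers):
    """Total within-cluster sum of squares (the 'wss' elbow criterion)."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Two well-separated groups of hypothetical 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
scores = {}
for k in (2, 3, 4):
    labels, centers = kmeans(points, k)
    scores[k] = silhouette(points, labels)
    print(f"k={k}: wss={wss(points, labels, centers):.2f}, silhouette={scores[k]:.3f}")
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```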
## year n_part n_vouchers average_cost part_type
## year 1.000000e+00 -0.13902219 -0.1380343 0.009404715 -3.218411e-21
## n_part -1.390222e-01 1.00000000 0.9934968 -0.065260541 -3.320404e-01
## n_vouchers -1.380343e-01 0.99349678 1.0000000 -0.160118546 -3.460947e-01
## average_cost 9.404715e-03 -0.06526054 -0.1601185 1.000000000 -8.351521e-02
## part_type -3.218411e-21 -0.33204042 -0.3460947 -0.083515208 1.000000e+00
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.4919 1.0353 0.9914 0.8472 0.03944
## Proportion of Variance 0.4452 0.2144 0.1966 0.1436 0.00031
## Cumulative Proportion 0.4452 0.6596 0.8561 0.9997 1.00000
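The Proportion of Variance row follows directly from the standard deviations: each component's variance is its squared standard deviation, divided by the total. The variances sum to about 5, one per variable, which is what we would expect if the PCA was run on standardized variables (an inference from the output, not a documented setting). This short Python check reproduces the reported proportions up to rounding.

```python
from itertools import accumulate

# Standard deviations copied from the "Importance of components" output above.
sdev = [1.4919, 1.0353, 0.9914, 0.8472, 0.03944]

variances = [s ** 2 for s in sdev]  # eigenvalues of the correlation matrix
total = sum(variances)              # close to 5, the number of standardized variables
proportion = [v / total for v in variances]
cumulative = list(accumulate(proportion))

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion {p:.4f}, cumulative {c:.4f}")
```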
## PC1 PC2 PC3 PC4 PC5
## year 0.1475834 -0.28036319 0.91421762 -0.2526286 0.0007505649
## n_part -0.6465852 0.01275225 0.02385580 -0.3076241 -0.6975408768
## n_vouchers -0.6528909 0.07800267 0.06358958 -0.2357423 0.7127629379
## average_cost 0.0967908 -0.81097916 -0.38223385 -0.4264704 0.0704600924
## part_type 0.3528623 0.50740738 -0.11614946 -0.7772353 0.0209888458
## # A tibble: 6 × 8
## year n_part n_vouchers average_cost part_type size withinss cluster
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <fct>
## 1 2.06e-17 -0.637 -0.530 -0.196 -1.41 108 109. 1
## 2 9.67e- 1 1.45 1.46 -0.569 -0.706 48 14.6 2
## 3 -7.74e- 1 2.26 2.31 -0.582 -0.706 60 21.0 3
## 4 -7.74e- 1 -0.554 -0.520 -0.584 1.06 120 61.3 4
## 5 2.06e-17 -0.0535 -0.263 1.95 0 108 111. 5
## 6 9.67e- 1 -0.668 -0.629 -0.599 1.06 96 36.1 6
## # A tibble: 11 × 3
## participant_category cluster n
## <chr> <int> <int>
## 1 Breastfeeding Mother 4 87
## 2 Breastfeeding Mother 6 21
## 3 Child 1 56
## 4 Child 2 25
## 5 Child 3 27
## 6 Infant 4 3
## 7 Infant 5 105
## 8 Non-Breastfeeding Mother 6 108
## 9 Prenatal 4 63
## 10 Prenatal 5 40
## 11 Prenatal 6 5
The bigram (word-pair) graph from our text analysis is extremely interesting: it shows the combinations of words that make up the names of WIC store locations across different counties in California. For example, we see Joes-Trader-Joe.
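The word graph is built from bigrams, pairs of adjacent words in each store name. A minimal Python sketch of that counting step follows, using hypothetical vendor names. In R this step is commonly done with `tidytext::unnest_tokens(token = "ngrams", n = 2)`; the team's actual code is not shown here.

```python
import re
from collections import Counter


def name_bigrams(store_name):
    """Lower-case a store name, keep alphabetic word tokens, and pair adjacent words."""
    tokens = re.findall(r"[a-z']+", store_name.lower())
    return list(zip(tokens, tokens[1:]))


def count_bigrams(store_names):
    """Count every adjacent word pair across all store names."""
    counts = Counter()
    for name in store_names:
        counts.update(name_bigrams(name))
    return counts


# Hypothetical vendor names for illustration only.
stores = ["Trader Joe's #123", "Trader Joe's #456", "Save Mart Supermarket", "Save Mart"]
for (first, second), n in count_bigrams(stores).most_common(3):
    print(f"{first} -> {second}: {n}")
```

Store numbers such as `#123` are dropped by the token pattern, so branches of the same chain collapse into one bigram.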
Largest Urban Counties
The faceted graphs show the most common WIC store locations in the six largest urban counties in California: Los Angeles, Orange, Riverside, San Bernardino, San Diego, and Santa Clara.
In the largest urban counties, quite a few gas stations appear among the most common WIC vendors: Rotten Robbie, Quik Stop, 7 Eleven, and United Oil, for example, are all among the top vendors for urban counties. We also found several convenience stores, like Mini Market in LA, among the top vendors. We find the large prevalence of gas stations and convenience stores in urban areas interesting, since we had expected urban areas to have access to more traditional food stores and markets.
Largest Rural and Suburban Counties
This graph shows the top store locations in the largest rural and suburban counties. Fastrip Food, MD Liquor, and Save Liquors are popular stores for WIC and SNAP. This is particularly relevant to our policy questions, as it tells us what types of stores offer these services. It is also worth noting that liquor stores appear frequently among the vendors offering these services.
Compared with the other counties in this section, Monterey stands out for its much greater variety of vendors, including small businesses like Esperanza Market and larger chains like 7 Eleven (Chain 2367).
Smallest Rural Counties
Similarly, in smaller rural counties, Yosemite Liquor and gas stations appear to be popular WIC store locations. This data could help inform policy decisions about the accessibility of social services and the kinds of stores that opt in to participate in WIC.
It is also worth noting that three of the small counties selected have only a couple of these stores each, yet these stores still appear as the top locations in those counties. This may warrant further research on access to social services in rural California.
One county, Alpine, does not appear in our analysis at all because it does not have enough vendors to meet our minimum threshold for bigrams (n = 30). We would also like to analyze Mariposa County further, because some of its top vendors are affiliated with Yosemite National Park; tourism could therefore overstate the access to WIC vendors that local residents actually have, particularly residents who do not live near Yosemite.
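The per-county vendor rankings and the n = 30 cutoff can be sketched together: count vendor names within each county, drop counties with too few records, and keep the most frequent names. This Python sketch assumes the threshold applies to the number of vendor records in a county; the county and vendor names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_RECORDS = 30  # the n = 30 threshold mentioned in the text (assumed to count vendor records)


def top_vendors_by_county(rows, top_n=3, minimum=MIN_RECORDS):
    """rows: iterable of (county, vendor_name) pairs.

    Counties with fewer than `minimum` rows are dropped; for the rest,
    return the `top_n` most frequent vendor names with their counts.
    """
    by_county = defaultdict(Counter)
    for county, vendor in rows:
        by_county[county][vendor] += 1
    return {
        county: vendors.most_common(top_n)
        for county, vendors in by_county.items()
        if sum(vendors.values()) >= minimum
    }


# Hypothetical rows: "County X" has 35 vendor records, "County Y" only 4.
rows = [("County X", "Gas Station A")] * 20 + [("County X", "Market B")] * 15
rows += [("County Y", "Liquor Store C")] * 4
print(top_vendors_by_county(rows))
# {'County X': [('Gas Station A', 20), ('Market B', 15)]}
```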
Conclusion
The team is excited about this data! We believe that understanding the nuances of the SNAP and WIC programs can help inform policy on how to increase redemption rates and how to get nutritious food to people in need. The restrictions of our dataset limited our ability to understand how demographics and labor statistics in California affect redemption rates, or how race affects which types of SNAP or WIC retailers are available. We look forward to expanding our research on this data.